Inference of kinetic Ising model on sparse graphs 
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Based on dynamical cavity method, we propose an approach to the inference of kinetic Ising 
model, which asks to reconstruct couplings and external fields from given time-dependent output 
of original system. Our approach gives an exact result on tree graphs and a good approximation 
on sparse graphs, it can be seen as an extension of Belief Propagation inference of static Ising 
model to kinetic Ising model. While existing mean field methods to the kinetic Ising inference 
e.g., naive mean-field, TAP equation and simply mean- field, use approximations which calculate 
magnetizations and correlations at time t from statistics of data at time t — 1, dynamical cavity 
method can use statistics of data at times earlier than t — 1 to capture more correlations at different 
time steps. Extensive numerical experiments show that our inference method is superior to existing 
mean-field approaches on diluted networks. 
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I. INTRODUCTION 



Inference problem, which aims at learning internal structure of complex systems from empirical output data, has 
been studied for a long time. Among many models, inverse Ising model, which is a basic pair-wise model in physics, 
has been widely studied for inference problem in many fields. 



external fields of Ising model from configurations sampled from Boltzmann distribution. Efficient algorithms based 
on mean-field approximations and message passing have been developed to address the inference task. Recently it 
receives special attention in neural network reconstruction of retinas based on multi-electrode recordings[22], and 
gene-gene interaction reconstruction[19]. In static inverse Ising model, empirical data are M equilibrium configura- 
tions {<Ti, <T2, • • • , ctm} that sampled from Boltzmann distribution V = -^e^ H ^ , where energy H (<r) is a function of 
couplings and external fields. When M is very large, posterior distribution of H peaks on the maximum point so that 
one can find the couplings by maximizing the log-likelihood £ = — (H) — logZ, where (H) denotes averaged energy. 
Thanks to the convexity of log-likelihood, the exact method, so called Boltzmann machine, computes exactly the 
correlations and magnetizations and match them with empirical ones. However, computing magnetizations and corre- 
lations exactly for large systems is unreachable, so many efforts have been paid to study approximate approaches based 
on different approximations, including naive mean-field approximation, TAP equation, small correlation expansion 
and message passing schemes. 

Boltzmann distribution is a strong restriction to system to be studied. Many real systems like biological systems, 
especially with time-dependent external stimuli, do not have this good property. Inverse kinetic Ising model, which 
asks to reconstruct couplings and time-dependent external fields from dynamic output of real system, obviously 
meets wider needs in real system inference. In inverse kinetic Ising model, exact reconstruction uses all empirical 
configurations in learning and time-complexity is proportional to the number of configurations. When number of 
configurations is large, one needs efficient approximate methods. In last years, Naive Mean-field and TAP approaches 
have been adapted from static Ising model in case of weak couplings [23|]. On densely connected fully asymmetric 
networks, an exact reconstruction method is proposed for asymmetric SK mo del [2~H. [25j. 



However, biological systems are often sparsely connected, so it should be valuable to adapt message passing [26( 
method which work well in sparse-connected static Ising model inference into kinetic Ising model inference. Another 
problem of existing mean-field methods is that they use approximations which predict quantities of time t + 1 based 
on time t, we term these approximations one-step approximations. There are two cases that one-step approximations 
works well. In first case, experimental data are drawn from stationary state where t — > co and in second case there 
is no feedback loops in system e.g. the fully asymmetric model. But in practice it is difficult to ensure these two 
conditions, so it should be useful to develop a method that use statistics at not only one time step before, but earlier 
time steps to do the construction. In what follows, we show how to use Bethe approximation in inference of kinetic 
Ising model, in context of dynamic cavity method. This can be seen as an extension of Belief-Propagation inference 
in static inverse Ising problems [26j. 

The paper is organized as follows. Firstly, in Sec. [Hj we give definition of kinetic Ising model. In Sec. IIIII we 
introduce exact inference and existing mean-field approaches. In Sec. IIVI we discuss dynamical cavity methods and 
inference of kinetic Ising model. Sec. [V] contains some numerical results to compare performance of inference by 
dynamical cavity method with other mean-field approximations. The last section contains conclusions and some 
discussions. 



We consider random graphs with N vertices and average connectivity C. In this paper we focus on sparse graphs 
that C <C N. Connections of graphs are defined by matrix cy = 0, 1. cy = 1 means there is a directed edge from 
vertex i to vertex j and cy = means there is no such edge. If cy = Cji = 1, the edge between i and j is symmetric, 
otherwise it is asymmetric. We set each edge to be symmetric with probability P s and asymmetric with probability 
1 — P s . Obviously, with P s = 1, graph is undirected and with P s = 0, graph is fully asymmetric. 

On each vertex i there is a spin Ui = {+1, —1} associated, and on each edge j —> i there is a coupling Jy associated. 
If Cji = 0, coupling Jij is set to otherwise value of Jy is generated randomly according to Gaussian distribution 
with zero mean and variance J/y/C. Dynamics of spins are defined by configurations at different discrete time steps 
from time to time T: ct(0), . . . ,ct(T). At each time step, value of spin i is updated based on configuration of last 
time according to 





II. KINETIC ISING MODEL 



(1) 
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where j3 denotes the inverse temperature and local field at time t is expressed as hi(t) = Yljedi ^ij a j (*) + ^i(*)> with 
9i(t) denoting external fields acting on spin i at time t. We consider in this paper the parallel dynamics, that all spins 
updated synchronously, evolution of configuration is written as 

N 

W\a (t + 1)\<t (*)] = J] Wfa (t + l)\a (t)]. (2) 
i=i 

We argue that it would be not difficult to extend our approach from parallel dynamics to sequential dynamics following 
what has been done for mean-field methods in [27}. 

Direct problem of kinetic Ising model asks to predict dynamical behavior of spins at an arbitrary time t, given 
couplings, external fields and initial state of network. Calculating full distribution of system P(cr(t)) is a very 
difficult task, so most theories focused on computing macroscopic observables e.g. magnetization at a time m(t) — 
Eo- ( (t) <7 i(*)p( cr i(*)) an d correlation at different times Cij(t,t') = ^(tj^ ff) cr i(t) cr j(t')p{o'i(t)i "^'(O)- This direct 
problem have been studied for long time especially in the context of attractor neural networks [28i l29j . Many approaches 
has been proposed to study ensemble averaged macroscopic quantities, e.g. dynamical replica method[3(|, generating 
functional methodpjlj and dynamical cavity method[32l|. 

Inverse problem of kinetic Ising model is the problem we would like to study in this paper. Unlike direct problem, 
instead of couplings and external fields, experimental output of original model (set of configurations) is given, one 
is asked to infer couplings and external fields from those experimental data. The experimental data are obtained 
by running parallel dynamics of Ising model according to eq.( [1} on graphs for R realizations of time length T 
paths(trajectories), denoted by {of (t)} with r £ and t € [1,T]. In this paper, we set T to 10. 



III. EXACT RECONSTRUCTION AND MEAN-FIELD APPROXIMATIONS 



Given experimental configurations, probability of observing these samplings as a function of couplings is written 



as: 



R T-l 



P(J ij ;6 i (t)) = l[ H W[v(t + l)\<r(t)] 



r=l t=l 
R T-l N 

= n n n ° x p ^< (* + *) % w - b g 2 ^^m)} 

r=l t=l i=l 

Then, log-likelihood of observing data is defined as: 
£ (Jy ;<?;(*))= log (P(Jy;0;(t))) 



(3) 
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EE 
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pj2 °i (* + 1) ( E J ^ ^ + - E lo s 2 cosh ^ I E J « a i ( f ) + ^(*) 

jddi J i=l \j£di 



i=l 



(4) 



To find the most-likely couplings and external fields, one needs to maximize the log-likelihood. At the maximum point 
of log-likelihood, by setting derivatives of C with respect to Jij and 9i to zero, one has following equations: 



«(* + !)> j 



(a[(t + l)a;(t)>. 



/ tanh/3 

\ \kedi 



^2 Jlk(J k (*) 






Jik(T r k (*) + 0i(f) 



(5) 



(6) 



T' R 



where (-) T denotes taking average over time and (•) R denotes taking average over realizations. 

Let us use mf ata (t) = (of (t)) R and cf°- ta (t,t — 1) = ^(cr[ (t) crj (t) — l) T ^ to denote experimental magnetization 

at time t and correlation at time t and t — 1 respectively, and use mf(t) and (i + 1, t) to denote magnetization and 
correlations predicted by Ising model, eqs. (O,© show that at max-likelihood point, mf(t + 1) and c^(t+ l,t) are 
calculated using configurations at time t, so we term it configurations based one-step reconstruction. The reconstruction 
can be carried out by using gradient descent learning starting from an initial couplings and fields: 

9 i = e l + v ({ m ^(t)) t -{ m I(t)) t ) (7) 



J iJ - + »?((t# ta (t + i s t)> -<4(t + i,i))j 



(8) 
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where constant r/ is learning rate. Note that in above scheme, one has to scan all configurations to compute the 
average on every learning step. It is time consuming and not realistic when amount of experimental data is huge. To 
overcome this computational problem, several mean-field approaches, e.g. Naive mean-field, TAP and simply mean- 
field method, are proposed. They use different approximations to compute macroscopic observables, say average 
magnetizations and correlations, and use those macroscopic observables, instead of all configurations, to do inference. 
Naive mean-field method[23] applies naive mean-field approximation 



rn((t + 1) = tanh 



0^2 Jam 



and compute one-step delayed correlation as 



\{t + !,<) = mj{t)m{(t + 1) + (1 - (m; 7 (< + l)) 2 ) £ J lk [c kj (t,t) - m k (t)m 3 {t)] . 



(9) 



(10) 



As an extension of naive mean-field, TAP method takes the expansion on couplings and outperforms naive TAP 
method in case of weak couplings, the approximation is written as: 



m-'(t + 1) = tanh 



P £ J^m 3 (t) - m{{t + 1) £ 4(1 - m 2 j{t)) 

jedi j 



(11) 



and delayed correlation is expressed as 

4(t + l,t) = mj {t)m({t + 1) + [1 - [m{{t + l)) 2 ] p 



l-[l-K 7 (t+l)) 2 ]^J 2 (l-m 2 (t) 



(12) 



Another recently proposed approach, simply Mean- field method \24. l25|. which assumes Gaussian distribution of the 
local field of each spin, computes magnetization as 



m{{t+l) 



J ge* 2 / 2 tanh/3 ( £ J ijmj (t) + x /]T J 2 (l 



(13) 



and delayed correlation as 



c{At+\,t)=mAt)mi{t + l) + 



dx 
2^' 



t x 2 /2 



1 - tanh 2 \ ]T J^mfi) + x J 2 (l - m?(i)) 

j£9a V j 



(14) 



In above listed mean-field approximations, predictions of mj (t+ 1) and cf 3 -,(t+ 1, t) are made by using magnetizations 
and correlations at time t, this kind of reconstruction can be termed statistics based one-step reconstruction. In some 
special cases, e.g. fully asymmetric networks, observables at time t are sufficient to predict observables at time t + 1 
since effect of feedback loops are not present. But In general cases, to predict observables at time t + 1 one needs to 
use not only observables at time t but also at earlier time steps up to t = 0. 



IV. DYNAMICAL CAVITY RECONSTRUCTION 



Dynamical cavity method, originally proposed in [321 ]. together with dynamical replica analysis and generating 
functional analyses, are powerful tools in analysing dynamics of networks. The advantage of dynamical cavity method 
with respect to other two methods is that it can be applied on single instances to compute observables for every node. 

From eq. ([2]), we can write the probability of observing a path as: 



P(<r [0 ' T] ) = l[W{a(t+l)\<T(t)}P(cT(0)), 



(15) 



t=o 
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where P(er(0)) denotes initial probability distribution of spin configuration at time 0. It is difficult to study directly 
the full path distribution, so dynamical cavity method focuses on marginal probability of a path that spin i evolves 
from time to time t conditioning on the local fields 6^°'*': 

By applying Bethe approximation, one can derive following iterative cavity equations (we refer to literature [32| for 
details of derivations): 

[0,t-l] j£dl\k 

j 

t-i ^ t '[v. cr j, i /- 1 +e 1 '- 1 ] 
' JA 2 cosh /^E^,^^'- 1 + 0^^ } ' (17) 

where Pi-^erj '*' |#j '* 1 ' + JfciCr[ '' ^) is cavity message denoting (cavity) marginal probability of a path that spin i 
evolves from time to time t with its neighbor spin k removed from the graph. 

Above equation is exact on tree graphs and a good approximation on sparse graphs. After a fixed point of cavity 
equation is reached, observables e.g. marginal probabilities, magnetizations and correlations can be computed as 
functions of cavity messages. However, solving equation eq. (|T7|) is very time-consuming for large t because computa- 
tional complexity is proportional to 2 2t . So in practice one needs approximations to reduce computational complexity. 
Here we adopt the most simple approximation, one-time approximation[32:], which is also named time- factorization 
approximation [33| : 

t 

t'=0 

This approximation ignores correlations among different time steps before time t — 2, and makes summation over 
variables before time t — 2 possible. Then, from eq. (|17[) . one arrives at 

p l ^\j k y k - i )=j2 e n ^(^v^r^^HiE^^r 1 ]^^" 2 )' ( ig ) 

and expression of magnetizations and delayed correlations can be derived from above equation 



TO,- 



(20) 



(t + 1,*) = EE II KM" 1 ) P ^ O^l-W" 1 ) tanh [ j8 E Jik4) 4 (21) 



With mf (t + 1) and c^- (t + 1, t) obtained, a standard procedure to perform inference is using gradient descent to learn 
couplings from differences between experimental and predicted correlations, and learn external fields from differences 
between experimental and predicted correlations. 

d Ising = Qlsing + (22) 

J^ = fjACy, (23) 

where 

Am i = if;[m? ,ta (t)-m/(t)] (24) 
1 T 

ACy = jT^y E [^"C*. * - !) - (*« 1 !)] • ( 25 ) 

t=2 
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FIG. 1. Evolution of inference error AJ and difference of one-step delayed correlations of dynamical cavity inference as 
function of iterating time for networks with N = 100, C = 5.0, P s — and different coupling strength. Number of realizations 
used in experimental data R is 5 x 10 4 . Networks are fully asymmetric with J = 0.2,0.9 and 2.0 in top, middle and bottom 
panel respectively. 



Note that there are two ways to do the inference using magnetizations and delayed correlations, one way as used in 
this paper is to learn couplings by gradient descent, and the other one is to solve directly the couplings by matching 
exactly the experimental data, which usually requires inversion of a correlation-based matrix[23H24|. Usually gradient 
descent learning works slower than matrix inversion method and the convergence of learning method may depend on 
initial couplings and learning rate. However, in matrix inversion method, sometimes it is difficult to find a set of 
couplings and external fields that exactly matches the noisy experimental data, while the learning method is able to 
converge to a fixed point close to the true experimental data (as shown in fig. [I]). 



V. PERFORMANCE AND COMPARISON WITH MEAN-FIELD METHODS 



In this section we make comparative analysis between dynamical cavity method and mean-field methods on inverse 
kinetic Ising problem. Before we compare performance of those methods, the first thing we are interested in is the 
convergence properties of gradient descent learning used in our scheme. In fig. [T] we plot the evolution of average 
inference error at each time step 



<ij> 



and average difference of one-step delayed correlation 



AC =jmJ^( c ^ c r a ) 2 ( 27 ) 

y <a> 

given by dynamical inference in the gradient descent process. In fig. [T] we recorded evolution of A J and AC in three 
gradient descent process on the same network with different coupling strengths. We can see that in three panels of 
fig. [TJ all correlation differences converge in the learning, but evolutions of AJ are much different. In case of weak 
couplings(e.g. J = 0.2 at left panel), AJ decreases monotonously with difference of correlation till converges to the 
error level characterized by noise in experimental data. In case of J = 0.9, AJ decreases to a minimum value then 
start increasing and finally converges to a point larger than the minimum value. When J increases to 2.0, AJ keeps 
increasing after reaching the minimum point and finally goes beyond the initial error and diverges. Coupling strength 
plays a role of inverse temperature, a larger coupling strength gives equivalently low temperature in the model and 
dynamics of the model becomes more difficult to predict due to stronger correlations. Consequently, approximations 
we made in dynamical method turns worse with stronger couplings and results to worse convergence of gradient 
descent learning and larger inference error. 
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FIG. 2. Difference of magnetizations Am, one-step delayed correlations AC and inference error AJ given by naive mean-field, 
TAP, simply mean-field and dynamical cavity method(DC) of Ising model with Gaussian distributed couplings with zero mean 
and variance J. X axes denote coupling strength J. Size of the graph is N = 100, average degree is 5. Fraction of symmetric 
edges P s — in left 3 figures, P s = 0.5 in middle 3 figures and in the right ones P a — 1.0. Data are averaged over 10 realizations. 
Some data of inference errors are not present because of un-convergence of gradient descent learning. 



We have compared convergence properties of gradient descent learning with other approximations on the same 
network(data not shown), results show that with naive mean- field, TAP and simply mean- field approximations, A J 
stops convergence with couplings weaker than 0.9 because they give even worse approximations than dynamical cavity 
method on larger couplings (see also fig. [2]). However exact reconstruction does not have this converging problem as 
it does not use approximations so inference error only depends on quality of data(e.g. number of realizations used in 
computing experimental data), and is not a function of coupling strength. 

To evaluate performance of approximations, especially how it is influenced by coupling strength, one method is 
to compare accuracy of approximations in direct problem, e.g., by applying approximations on graphs with fixed 
couplings and external fields to compute magnetization and correlation and compare result with experimental data. 
Error in correlations are characterized by eq. (|2T[) and now AC is not a function of time t since couplings are fixed. 
Error of magnetization is defined in the same way by difference of magnetization given by approximations and that 
in the experimental data: 



Am = ■ 

Result are plotted in fig. [3J where graphs have same number of nodes and average connectivity but different degree 
of symmetry. Note that number of samplings used in computing experiment data is large enough, and noise in 
experimental data can be ignored compared with error made in approximations, so a smaller Am and A J indicates a 
lower error made by the approximation. Top and middle panels of fig.[2]show that in direct problem, all approximations 
work better in asymmetric networks than in symmetric networks. This is because in asymmetric networks, feedback 
correlations are not present so correlation ignored by approximations are fewer than in symmetric networks. Among 
all method, dynamical method works the best in this direct problem, which indicates that Bethe approximations 
is more accurate than mean-field approximations in diluted networks. More over, TAP method outperforms naive 
mean-field only with weak couplings, because the expansion is carried out with weak couplings, and it is more difficult 
to find a solution in eq. (fTTj) with stronger couplings. 

To evaluate performance of inference based on these approximations, we computed inference error A J given in eq. 
([55)1 by different approximations on same set of graphs, results are plotted in bottom panel of fig. f2J TAP works 
badly and stops converging on networks with very weak coupling strength, so I did not plot the inference error of 
TAP. The missing points in the figure indicate that with value of that coupling strength, gradient descent learning 
does not converge. Figures show that all methods perform worse when degree of symmetry increases, as in the direct 
problems. Dynamical cavity method converges with larger J and gives smaller inference error at same J than Naive 
mean-field and simply mean-field. Simply mean-field outperforms naive mean-field only with weak couplings and stops 
converging with weaker couplings. One interesting point is that, with degree of symmetry of network increases, the 
difference of AC and A J between dynamical cavity method and simply mean-field becomes larger while the difference 
between naive mean-field and dynamical cavity becomes smaller. 
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FIG. 3. Inference error of dynamical cavity inference(DC), naive mean-field and exact method for networks with different 
system size N. Average connectivity of network is 5, P s = 0, J = 1.5 and data are averaged over 10 realizations. 

We saw from fig.[T]and fig.[2]that, performances of dynamical cavity method is much worse than exact reconstruction. 
This is because due to computational expenses, networks we tested here are rather small, so loop effect can not be 
ignored on these networks (see e.g. [13]). If we increase the sparsity of the graph, the error of dynamical cavity 
method should be decreased. To illustrate at this point, in fig|3]we plot inference errors of dynamical cavity method 
for networks with same average connectivity but different system sizes. The figure shows that with N increasing, 
inference error approaches the error of exact inference. We expect that with N — > oo, inference error of dynamical 
cavity method gives same result to that of exact reconstruction in fully asymmetric networks. Note that in contrast 
to dynamical cavity method, mean-field method does not benefits from increasing system size because mean-field 
approximation does not become better with sparsity increases, and as a consequence, performance between dynamical 
cavity method and mean-field methods becomes larger for sparse networks with larger system sizes. 



VI. CONCLUSION AND DISCUSSION 

In this paper we presented how to use dynamical cavity method to infer kinetic Ising model, and compared the 
performance on sparse graphs with existing mean-field methods. Results show that in sparse graphs dynamical 
cavity method works better than other mean-field approximations and performance increases with sparsity of graph 
increases. In computing cavity marginals, we use the simplest one-time approximation to simplify the cavity equations, 
and statistical quantities at time t + 1 are computed from quantities at time t and time t — 1, which is different from 
mean-field approximations which do predictions by using only quantities at time t. We believe that dynamical cavity 
inference could benefit a lot from developing more efficient approximations in computing cavity equations to collect 
correlations between more time steps. The networks discussed in this paper have known topology. It is difficult to 
apply dynamical cavity method to networks with unknown structure since it is so expensive to run cavity equations 
on fully-connected networks. For inference problems which asks to reconstruct both network topology and coupling 
strengths, one can use another method e.g. simply mean-field to reconstruct the structure then refine the couplings 
by cavity method. 
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